
Streaming Generated Gaussian Process Experts for Online Learning and Control

Yang, Zewen, Zhang, Dongfa, Dai, Xiaobing, Yu, Fengyi, Zhang, Chi, Huang, Bingkun, Sadeghian, Hamid, Haddadin, Sami

arXiv.org Machine Learning

Gaussian Processes (GPs), as a nonparametric learning method, offer flexible modeling capabilities and calibrated uncertainty quantification for function approximation. Additionally, GPs support online learning by efficiently incorporating new data with polynomial-time computation, making them well-suited for safety-critical dynamical systems that require rapid adaptation. However, the inference and online updates of exact GPs on streaming data incur cubic computation time and quadratic memory complexity, limiting their scalability to large datasets in real-time settings. In this paper, we propose a streaming kernel-induced progressively generated expert framework of Gaussian processes (SkyGP) that addresses both computational and memory constraints by maintaining a bounded set of experts, while inheriting the learning performance guarantees of exact Gaussian processes. Furthermore, two SkyGP variants are introduced, each tailored to a specific objective: maximizing prediction accuracy (SkyGP-Dense) or improving computational efficiency (SkyGP-Fast). The effectiveness of SkyGP is validated through extensive benchmarks and real-time control experiments, demonstrating superior performance compared to state-of-the-art approaches.
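
The abstract does not give implementation details, but the core mechanism it describes — a bounded set of local GP experts grown and queried over a data stream — can be sketched in a few dozen lines. The sketch below is an assumption-laden illustration, not SkyGP itself: the names (`LocalGP`, `BoundedExpertSet`), the centroid-based routing rule, and the similarity-weighted aggregation are stand-ins for the paper's kernel-induced expert generation.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row-stacked points."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ls ** 2)

class LocalGP:
    """Exact GP regressor over a bounded local subset of the stream."""
    def __init__(self, cap=50, noise=1e-2):
        self.X, self.y, self.cap, self.noise = [], [], cap, noise

    def add(self, x, y):
        self.X.append(x); self.y.append(y)

    def center(self):
        return np.mean(self.X, axis=0, keepdims=True)

    def predict(self, Xq):
        X, y = np.array(self.X), np.array(self.y)
        K = rbf(X, X) + self.noise * np.eye(len(X))
        return rbf(Xq, X) @ np.linalg.solve(K, y)

class BoundedExpertSet:
    """Route each streaming point to the kernel-closest expert,
    spawning new experts only until a fixed budget is reached."""
    def __init__(self, max_experts=8, cap=50):
        self.experts, self.max_experts, self.cap = [], max_experts, cap

    def update(self, x, y):
        if self.experts:
            sims = [rbf(x[None], e.center())[0, 0] for e in self.experts]
            best = self.experts[int(np.argmax(sims))]
            if len(best.X) < best.cap:
                best.add(x, y); return
        if len(self.experts) < self.max_experts:
            e = LocalGP(self.cap); e.add(x, y); self.experts.append(e)
        else:
            # Budget hit: a real method would evict or merge experts;
            # this sketch just grows the closest one past its capacity.
            best.add(x, y)

    def predict(self, Xq):
        # Similarity-weighted aggregation of the experts' local means.
        mus = np.stack([e.predict(Xq) for e in self.experts])
        w = np.stack([rbf(Xq, e.center())[:, 0] for e in self.experts])
        return np.sum(w / w.sum(axis=0) * mus, axis=0)

# Stream 400 noisy observations, then query a few test inputs.
model, rng = BoundedExpertSet(), np.random.default_rng(0)
for _ in range(400):
    x = rng.uniform(-3, 3, 1)
    model.update(x, np.sin(2 * x[0]) + 0.05 * rng.normal())
print(model.predict(np.linspace(-3, 3, 5)[:, None]))
```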


Fusion of Gaussian Processes Predictions with Monte Carlo Sampling

Ajirak, Marzieh, Waxman, Daniel, Llorente, Fernando, Djuric, Petar M.

arXiv.org Machine Learning

In science and engineering, we often work with models designed for accurate prediction of variables of interest. Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes. In this paper, we operate within the Bayesian paradigm, relying on Gaussian processes as our models. These models generate predictive probability density functions (pdfs), and the objective is to integrate them systematically, employing both linear and log-linear pooling. We introduce novel approaches for log-linear pooling, determining input-dependent weights for the predictive pdfs of the Gaussian processes. The aggregation of the pdfs is realized through Monte Carlo sampling, drawing samples of weights from their posterior. The performance of these methods, as well as those based on linear pooling, is demonstrated using a synthetic dataset.
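
For Gaussian predictive pdfs, both pooling rules in this paper have closed forms per weight sample: the linear pool is a mixture whose moments follow from the law of total variance, and the log-linear pool of Gaussians is again Gaussian, with the weighted precisions adding up. A minimal sketch, standing in a Dirichlet draw for the weight posterior (the paper infers input-dependent weights; that part is not reproduced here):

```python
import numpy as np

def linear_pool(mu, var, w):
    """Linear opinion pool: moments of the mixture sum_k w_k N(mu_k, var_k)."""
    m = np.sum(w * mu)
    v = np.sum(w * (var + mu ** 2)) - m ** 2
    return m, v

def loglinear_pool(mu, var, w):
    """Log-linear pool of Gaussians is Gaussian: weighted precisions add."""
    prec = np.sum(w / var)
    return np.sum(w * mu / var) / prec, 1.0 / prec

# Predictive means/variances from K = 3 GP experts at one test input.
mu = np.array([0.9, 1.2, 0.4])
var = np.array([0.10, 0.30, 0.20])

# Monte Carlo over pooling weights, sampled here from a flat Dirichlet.
rng = np.random.default_rng(1)
W = rng.dirichlet(alpha=np.ones(3), size=2000)

lin = np.array([linear_pool(mu, var, w) for w in W])
log = np.array([loglinear_pool(mu, var, w) for w in W])
print("linear pool     mean/var:", lin.mean(axis=0))
print("log-linear pool mean/var:", log.mean(axis=0))
```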


Mixtures of Gaussian process experts based on kernel stick-breaking processes

Saikai, Yuji, Dang, Khue-Dung

arXiv.org Artificial Intelligence

Mixtures of Gaussian process experts are a class of models that can simultaneously address two of the key limitations inherent in standard Gaussian processes: scalability and predictive performance. In particular, models that use Dirichlet processes as gating functions permit straightforward interpretation and automatic selection of the number of experts in a mixture. While the existing models are intuitive and capable of capturing non-stationarity, multi-modality and heteroskedasticity, the simplicity of their gating functions may limit predictive performance when applied to complex data-generating processes. Capitalising on recent advances in the dependent Dirichlet process literature, we propose a new mixture model of Gaussian process experts based on kernel stick-breaking processes. Our model maintains the intuitive appeal of the existing models yet improves on their performance. To make it practical, we design a sampler for posterior computation based on slice sampling. The model behaviour and improved predictive performance are demonstrated in experiments using six datasets.
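
The gating mechanism named in the title can be stated concretely: expert k gets weight pi_k(x) = v_k K(x, xi_k) prod_{j<k} (1 - v_j K(x, xi_j)), where the v_k are Beta-distributed stick proportions, the xi_k are stick locations, and K is a kernel bounded in [0, 1], so the mixture weights vary with the input. A minimal sketch for scalar inputs; the function name `ksbp_weights` and all hyperparameters are illustrative:

```python
import numpy as np

def ksbp_weights(x, locs, v, ls=1.0):
    """Kernel stick-breaking gating weights for one input x:
    pi_k = v_k K(x, xi_k) * prod_{j<k} (1 - v_j K(x, xi_j))."""
    K = np.exp(-0.5 * (x - locs) ** 2 / ls ** 2)  # Gaussian kernel in (0, 1]
    breaks = v * K
    pis = breaks * np.concatenate(([1.0], np.cumprod(1.0 - breaks)[:-1]))
    return pis  # leftover mass 1 - sum(pis) belongs to further sticks

rng = np.random.default_rng(0)
n_sticks = 10
locs = rng.uniform(-3, 3, n_sticks)   # stick locations xi_k
v = rng.beta(1.0, 1.0, n_sticks)      # stick proportions v_k

# The dominant expert changes with the input location.
for x in (-2.0, 0.0, 2.0):
    pi = ksbp_weights(x, locs, v)
    print(f"x={x:+.1f}  top expert={pi.argmax()}  pi_max={pi.max():.2f}")
```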


Mixtures of Gaussian Process Experts with SMC$^2$

Härkönen, Teemu, Wade, Sara, Law, Kody, Roininen, Lassi

arXiv.org Artificial Intelligence

Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory requirements due to the need to invert and store a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered, where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and the Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings where a stationary Gaussian process is inappropriate, while still being fully parallelizable.
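
The complexity argument here — replacing one covariance matrix of size N with K local ones of size roughly N/K — is worth making concrete: cubic-cost inference then drops from order N^3 to K (N/K)^3 = N^3 / K^2 operations, a K^2-fold saving. A small illustration of that arithmetic (not of the nested SMC sampler itself):

```python
# Cost of one exact-GP inference is O(N^3) in the number of points N.
# Splitting N points evenly across K independent experts costs
# K * (N/K)^3 = N^3 / K^2, i.e. a K^2-fold reduction in cubic work.
N = 10_000
for K in (1, 10, 100):
    flops_exact = N ** 3
    flops_experts = K * (N // K) ** 3
    print(f"K={K:>3}  relative cost = {flops_experts / flops_exact:.4f}")
```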


Incremental Ensemble Gaussian Processes

Lu, Qin, Karanikolas, Georgios V., Giannakis, Georgios B.

arXiv.org Machine Learning

Belonging to the family of Bayesian nonparametrics, Gaussian process (GP) based approaches have well-documented merits not only in learning over a rich class of nonlinear functions, but also in quantifying the associated uncertainty. However, most GP methods rely on a single preselected kernel function, which may fall short in characterizing data samples that arrive sequentially in time-critical applications. To enable online kernel adaptation, the present work advocates an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary. With each GP expert leveraging a random feature-based approximation to perform online prediction and model updates with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions. Further, the novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner. To benchmark the performance of IE-GP and its dynamic variant in the adversarial setting where the modeling assumptions are violated, rigorous performance analysis is conducted via the notion of regret, as is the norm in online convex optimization. Last but not least, online unsupervised learning for dimensionality reduction is explored under the novel IE-GP framework. Synthetic and real data tests demonstrate the effectiveness of the proposed schemes.
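
The two ingredients the abstract names — per-expert random feature approximations for scalable online updates, and data-adaptive ensemble weights at the meta-learner — can be sketched compactly. Below, each expert is Bayesian linear regression on random Fourier features of an RBF kernel with its own lengthscale, and the meta-learner reweights experts by their running predictive likelihood. Class names and hyperparameters are illustrative, and the paper's dynamic and unsupervised extensions are not reproduced.

```python
import numpy as np

class RFExpert:
    """One GP learner: Bayesian linear regression on random Fourier
    features approximating an RBF kernel with a fixed lengthscale."""
    def __init__(self, ls, D=100, noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1.0 / ls, (D, 1))   # spectral frequencies
        self.b = rng.uniform(0.0, 2 * np.pi, D)      # random phases
        self.P = np.eye(D)        # posterior precision of feature weights
        self.r = np.zeros(D)      # precision-weighted mean statistic
        self.noise, self.D = noise, D

    def feat(self, x):
        return np.sqrt(2.0 / self.D) * np.cos(self.W @ x + self.b)

    def predict(self, x):
        z = self.feat(x)
        mean = z @ np.linalg.solve(self.P, self.r)
        var = z @ np.linalg.solve(self.P, z) + self.noise ** 2
        return mean, var

    def update(self, x, y):
        z = self.feat(x)                 # rank-one online posterior update
        self.P += np.outer(z, z) / self.noise ** 2
        self.r += z * y / self.noise ** 2

class EnsembleMetaLearner:
    """Reweight experts by their online predictive likelihood."""
    def __init__(self, lengthscales):
        self.experts = [RFExpert(ls, seed=k)
                        for k, ls in enumerate(lengthscales)]
        self.logw = np.zeros(len(self.experts))

    def step(self, x, y):
        for k, e in enumerate(self.experts):
            m, v = e.predict(x)          # score before updating (prequential)
            self.logw[k] -= 0.5 * (np.log(2 * np.pi * v) + (y - m) ** 2 / v)
            e.update(x, y)
        w = np.exp(self.logw - self.logw.max())
        return w / w.sum()

# Stream data drawn from a short-lengthscale function; the weights
# concentrate on the kernel in the dictionary that fits best.
ie, rng = EnsembleMetaLearner([0.3, 1.0, 3.0]), np.random.default_rng(2)
for _ in range(200):
    x = rng.uniform(-2, 2, 1)
    w = ie.step(x, np.sin(3 * x[0]) + 0.1 * rng.normal())
print("weights over kernel dictionary:", np.round(w, 3))
```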


Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes

Lee, Jongseok, Feng, Jianxiang, Humt, Matthias, Müller, Marcus G., Triebel, Rudolph

arXiv.org Artificial Intelligence

This paper presents a probabilistic framework to obtain both reliable and fast uncertainty estimates for predictions with Deep Neural Networks (DNNs). Our main contribution is a practical and principled combination of DNNs with sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we devise a learning algorithm that brings the derived theory into practice. In experiments from two different robotic tasks -- inverse dynamics of a manipulator and object detection on a micro-aerial vehicle (MAV) -- we show the effectiveness of our approach in terms of predictive uncertainty, improved scalability, and run-time efficiency on a Jetson TX2. We thus argue that our approach can pave the way towards reliable and fast robot learning systems with uncertainty awareness.


Leveraging Probabilistic Circuits for Nonparametric Multi-Output Regression

Yu, Zhongjie, Zhu, Mingye, Trapp, Martin, Skryagin, Arseny, Kersting, Kristian

arXiv.org Machine Learning

Exact inference in Gaussian processes (GPs) scales cubically in the number of data points, thus limiting their use to moderately sized data sets. To enable posterior inference in GPs on large-scale problems, recent work (see e.g. Liu et al. [2020] for a detailed review) mainly resorts to global approximations to the posterior, e.g., using inducing points, or local approximations that aim to distribute the computation of the posterior onto local experts. Unfortunately, most of these approaches focus only on single-output regression, i.e., the dependent variable is univariate, and in the case of local approximations, do not easily extend to multi-output regression tasks; see Bruinsma et al. [2020] for a detailed discussion of recent techniques for multi-output GPs. Inspired by recent advances in the field of expert-based approximations of GPs, we present an expert-based approach to large-scale multi-output regression using single-output GP experts. Employing a deeply structured mixture of single-output GPs encoded via a probabilistic circuit allows us to capture correlations between multiple output dimensions accurately. By recursively partitioning the covariate space and the output space, posterior inference in our model reduces to inference on the single-output GP experts.
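
The density this abstract describes — a sum node mixing partitions, with product nodes factorizing over output dimensions and single-output GP leaves — can be evaluated in a few lines once the leaf predictives are in hand. A toy sketch with fixed numbers standing in for GP leaf predictions at a single test input (the recursive partitioning and learned circuit structure are not reproduced):

```python
import numpy as np

def norm_logpdf(y, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * ((y - mu) / sd) ** 2

# Sum node over C partitions of product nodes over D output dimensions,
# with one univariate Gaussian leaf per (partition, output dimension).
w = np.array([0.6, 0.4])            # sum-node weights, one per partition
mu = np.array([[0.0, 1.0],          # mu[c, d]: leaf mean for partition c,
               [2.0, -1.0]])        # output dimension d
sd = np.array([[0.5, 0.3],
               [0.4, 0.6]])

def circuit_logpdf(y):
    # log sum_c w_c * prod_d N(y_d; mu[c, d], sd[c, d]^2), computed stably.
    comp = np.log(w) + norm_logpdf(y, mu, sd).sum(axis=1)
    return np.logaddexp.reduce(comp)

print(circuit_logpdf(np.array([0.1, 0.9])))   # joint density over 2 outputs
```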


Modulating Scalable Gaussian Processes for Expressive Statistical Learning

Liu, Haitao, Ong, Yew-Soon, Jiang, Xiaomo, Wang, Xiaofang

arXiv.org Machine Learning

In a learning task, a Gaussian process (GP) learns the statistical relationship between inputs and outputs, offering not only the prediction mean but also the associated variability. The vanilla GP, however, struggles to learn complicated distributions exhibiting, e.g., heteroscedastic noise, multi-modality and non-stationarity from massive data, due to its Gaussian marginals and cubic complexity. To this end, this article studies new scalable GP paradigms, including the non-stationary heteroscedastic GP, the mixture of GPs and the latent GP, which introduce additional latent variables to modulate the outputs or inputs in order to learn richer, non-Gaussian statistical representations. We further resort to different variational inference strategies to arrive at analytical or tighter evidence lower bounds (ELBOs) of the marginal likelihood for efficient and effective model training. Extensive numerical experiments against state-of-the-art GP and neural network (NN) counterparts on various tasks verify the superiority of these scalable modulated GPs, especially the scalable latent GP, for learning diverse data distributions.
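
Of the paradigms listed, the heteroscedastic one is the simplest to illustrate: a latent function g modulates the output noise via sigma^2(x) = exp(g(x)). In the sketch below, g is a fixed, hand-picked function purely to show the effect of the modulation; the paper instead places a (variational) GP prior on g and learns it from data.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

# Heteroscedastic modulation: a latent log-noise function sets the
# input-dependent observation variance sigma^2(x) = exp(g(x)).
g = lambda x: -3.0 + 2.0 * np.tanh(x)        # noise grows with x (assumed)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 60))
y = np.sin(X) + rng.normal(0, np.exp(0.5 * g(X)))  # heteroscedastic data

# Exact GP posterior with the modulated noise on the diagonal.
Xq = np.linspace(-3, 3, 5)
K = rbf(X, X) + np.diag(np.exp(g(X)))
Kq = rbf(Xq, X)
mu = Kq @ np.linalg.solve(K, y)
var = 1.0 - np.sum(Kq * np.linalg.solve(K, Kq.T).T, axis=1) + np.exp(g(Xq))
for xq, m, v in zip(Xq, mu, var):
    print(f"x={xq:+.2f}  mean={m:+.2f}  predictive sd={np.sqrt(v):.2f}")
```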


Ultra-fast Deep Mixtures of Gaussian Process Experts

Etienam, Clement, Law, Kody, Wade, Sara

arXiv.org Machine Learning

Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, and sparse Gaussian processes (GPs) have shown promise as a leading candidate for the experts in such models. In the present article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). This combination provides a flexible, robust, and efficient model which is able to significantly outperform competing models. We furthermore consider efficient approaches to computing maximum a posteriori (MAP) estimators of these models by iteratively maximizing the distribution of experts given allocations and allocations given experts. We also show that a recently introduced method called Cluster-Classify-Regress (CCR) is capable of providing a good approximation of the optimal solution extremely quickly. This approximation can then be further refined with the iterative algorithm.
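
The CCR procedure mentioned at the end decomposes into three standard steps: cluster the joint (x, y) pairs, train a classifier from x to cluster labels as the gating network, and fit one regressor per cluster as the experts. A minimal sketch using scikit-learn, with a small MLP standing in for the paper's DNN gating and exact GPs standing in for its sparse GP experts; all hyperparameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (300, 1))
y = np.where(X[:, 0] < 0, np.sin(3 * X[:, 0]), 2 + 0.5 * X[:, 0])
y = y + rng.normal(0, 0.05, 300)   # two regimes with a discontinuity

K = 2
# (1) Cluster: partition the joint (x, y) pairs.
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(
    np.column_stack([X, y]))
# (2) Classify: learn the gating map x -> cluster label.
gate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X, labels)
# (3) Regress: fit one GP expert per cluster.
experts = [GaussianProcessRegressor().fit(X[labels == k], y[labels == k])
           for k in range(K)]

# Predict by routing each query through the gating network.
Xq = np.array([[-1.5], [1.5]])
for xq, k in zip(Xq, gate.predict(Xq)):
    print(f"x={xq[0]:+.1f} -> expert {k}, f(x)={experts[k].predict([xq])[0]:.2f}")
```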


Enriched Mixtures of Gaussian Process Experts

Gadd, Charles W. L., Wade, Sara, Boukouvalas, Alexis

arXiv.org Machine Learning

Mixtures of experts probabilistically divide the input space into regions, where the assumptions of each expert, or conditional model, need only hold locally. Combined with Gaussian process (GP) experts, this results in a powerful and highly flexible model. We focus on alternative mixtures of GP experts, which model the joint distribution of the inputs and targets explicitly. We highlight issues of this approach in multi-dimensional input spaces, namely, poor scalability and the need for an unnecessarily large number of experts, degrading the predictive performance and increasing uncertainty. We construct a novel model to address these issues through a nested partitioning scheme that automatically infers the number of components at both levels. Multiple response types are accommodated through a generalised GP framework, while multiple input types are included through a factorised exponential family structure. We show the effectiveness of our approach in estimating a parsimonious probabilistic description of both synthetic data of increasing dimension and an Alzheimer's challenge dataset.
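
The nested partitioning scheme is the part of the construction that lends itself to a short sketch: an outer Chinese-restaurant-style partition with an inner partition nested inside each outer cluster, so the number of components at both levels is random rather than fixed. The sketch below shows only this two-level prior, with illustrative concentrations alpha_y and alpha_x; the enriched-process details and the GP experts themselves are not reproduced.

```python
import numpy as np

def crp_assign(counts, alpha, rng):
    """Chinese restaurant process: join a table with probability
    proportional to its size, or a new one proportional to alpha."""
    p = np.array(counts + [alpha], dtype=float)
    return rng.choice(len(p), p=p / p.sum())

rng = np.random.default_rng(0)
alpha_y, alpha_x = 1.0, 1.0
outer, inner = [], []   # outer cluster sizes; nested sizes per outer cluster
for n in range(20):
    j = crp_assign(outer, alpha_y, rng)       # outer-level assignment
    if j == len(outer):
        outer.append(0); inner.append([])     # open a new outer cluster
    outer[j] += 1
    k = crp_assign(inner[j], alpha_x, rng)    # nested inner assignment
    if k == len(inner[j]):
        inner[j].append(0)                    # open a new inner cluster
    inner[j][k] += 1

print("outer cluster sizes:", outer)
print("nested cluster sizes:", inner)
```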